Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(source): add NATS source consumer parameters #17615

Merged
merged 5 commits into from
Aug 26, 2024

Conversation

gbto
Copy link
Contributor

@gbto gbto commented Jul 8, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR adds the 20 missing parameters supported in the async_nats crate of the consumer created in NATS sources. The DDL of a NATS source would now look like:

CREATE SOURCE (
...
) WITH (
    connector='nats',
    server_url='{{ env_var("SERVER") }}',
    subject='risingwave.test.source',
    stream='risingwave-test-source',
    scan.startup.mode='earliest',
    connect_mode='user_and_password',
    username='{{ env_var("USER") }}',
    password='{{ env_var("PASSWORD") }}',
    consumer.durable_name='risingwave-test-source',
    consumer.description='desc-test-source',
    consumer.ack_policy='all',
    consumer.ack_wait=10,
    consumer.max_deliver=10,
    consumer.filter_subjects='demo.subject.filter.*',
    consumer.filter_subjects='demo.subject.filter.1,demo.subject.filter.2',
    consumer.replay_policy='instant',
    consumer.rate_limit=100000000000,
    consumer.sample_frequency=100,
    consumer.max_waiting=10,
    consumer.max_ack_pending=10,
    -- consumer.idle_heartbeat=60, not available in async_nats crate
    consumer.max_batch=1000,
    consumer.max_bytes=1000000000,
    consumer.max_expires=3600,
    consumer.inactive_threshold=10000000,
    consumer.memory_storage='false',
    consumer.backoff='10,30,60',
    consumer.num_replicas=1
) FORMAT PLAIN ENCODE JSON

All parameters are optional. If left empty, it falls back to the async_nats default values. All are eventually parsed as strings in RisingWave and casted to the appropriate types before being passed to the build_consumer method.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Release note

Now, all the parameters supported by the async_nats crate are supported in the RisingWave NATS source connector. In particular, users can now configure few important aspects of consumers such as the number of replicas that previously would default to 1, hereby causing Jetstream issues of synchronization across replicas. It also allows setting parameters such as max_waiting or max_ack_pending that can be critical for performance optimization of NATS clusters.

No backward incompatibility would be introduced by these changes.

@gbto gbto requested a review from a team as a code owner July 8, 2024 15:37
@gbto gbto requested a review from MrCroxx July 8, 2024 15:37
@graphite-app graphite-app bot requested a review from a team July 8, 2024 15:39
@BugenZhao
Copy link
Member

Hi. Can you help to rebase the commits first?

@stdrc stdrc changed the title feat/add-nats-source-consumer-parameters feat(source): add NATS source consumer parameters Jul 9, 2024
@gbto gbto force-pushed the feat/add-nats-consumer-parameters branch from e3ce905 to 6f3f8f9 Compare July 9, 2024 08:53
@gbto gbto force-pushed the feat/add-nats-consumer-parameters branch from e044338 to 90a675b Compare July 9, 2024 08:58
@stdrc stdrc requested review from stdrc, tabVersion and yufansong and removed request for a team and MrCroxx July 10, 2024 02:32
Comment on lines 664 to 682
durable_name,
description,
ack_wait: ack_wait.unwrap_or_default(),
max_deliver: max_deliver.unwrap_or_default(),
filter_subject: filter_subject.unwrap_or_default(),
filter_subjects: filter_subjects.unwrap_or_default(),
rate_limit: rate_limit.unwrap_or_default(),
sample_frequency: sample_frequency.unwrap_or_default(),
max_waiting: max_waiting.unwrap_or_default(),
max_ack_pending: max_ack_pending.unwrap_or_default(),
// idle_heartbeat: idle_heart.unwrap_or_default(),
max_batch: max_batch.unwrap_or_default(),
max_bytes: max_bytes.unwrap_or_default(),
max_expires: max_expires.unwrap_or_default(),
inactive_threshold: inactive_threshold.unwrap_or_default(),
memory_storage: memory_storage.unwrap_or_default(),
backoff: backoff.unwrap_or_default(),
num_replicas: num_replicas.unwrap_or_default(),
..Default::default()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These .unwrap_or_default()s have subtly different semantics from ..Default::default(), although in current fact they behave the same because Config is just deriving Default but not manually implementing it.

I'm OK with this but we should be aware of the difference.

sample_frequency: Option<u8>,
max_waiting: Option<i64>,
max_ack_pending: Option<i64>,
_idle_heartbeat: Option<Duration>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this deprecated? May just remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the behavior of the missed idle heartbeat is a bit cryptic with the ‘async_nats’ crate and is a relatively important parameter user because it translates into error messages sent by the stream in the form of “missed idle heartbeat”, which can cause pipelines to fail. Not sure how to handle this one tbh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed from the configuration, since it doesn't exist in the pull consumer config: https://docs.rs/async-nats/latest/async_nats/jetstream/consumer/pull/struct.Config.html

@tabVersion
Copy link
Contributor

Suggest following the practice at #11203
The introduce params are not essential to run NATS connector, so better to move them into a new struct.

Comment on lines 93 to 104
properties.ack_wait.clone().map(|s| {
Duration::from_secs(s.parse::<u64>().expect("failed to parse ack_wait to u64"))
}),
properties.max_deliver.clone().map(|s| {
s.parse::<i64>()
.expect("failed to parse max_deliver to i64")
}),
properties.filter_subject.clone(),
properties
.filter_subjects
.clone()
.map(|s| s.split(',').map(|s| s.to_string()).collect()),
Copy link
Member

@xxchan xxchan Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't look good. We'd better use typed properties, and parse the config when creating the properties struct.
So that CREATE SOURCE will reject invalid config, instead of failing at runtime.

e.g., using #[serde_as(as = "Option<DisplayFromStr>")] or #[serde(deserialize_with = ...)] like other connectors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah indeed that seems much better, even without understanding the error related to config parsing that parsing seemed a bit sketchy. I’ll give it a try in the upcoming days @xxchan thank you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the consumer specific parameters into a separate struct, and added some typed properties based on @xxchan's suggestion

@benjamin-awd
Copy link
Contributor

Just added a commit to address the comments from previous reviewers! Sorry for coming back late on this 🙏

@fuyufjh fuyufjh requested review from stdrc and xxchan August 23, 2024 06:52
Copy link
Member

@stdrc stdrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can approve if others don't have further comments.

Copy link
Contributor

@tabVersion tabVersion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

src/connector/src/lib.rs Outdated Show resolved Hide resolved
pub max_bytes: Option<i64>,

#[serde(rename = "consumer.max_expires")]
#[serde_as(as = "Option<DurationSeconds<String>>")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

src/connector/src/source/nats/mod.rs Outdated Show resolved Hide resolved
Comment on lines +112 to +114
#[serde(rename = "consumer.ack_policy")]
#[serde_as(as = "Option<DisplayFromStr>")]
pub ack_policy: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can try to deser it into Enum directly, serde can help us reject unrecognized str.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this, but unfortunately couldn't find a clean way to do it -- the enums from the NATS crate do not implement FromStr, so it doesn’t seem possible to deserialize directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, I will try refactor it later. Thanks for your contribution.

Copy link
Contributor

@tabVersion tabVersion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Copy link
Member

@yufansong yufansong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fuyufjh fuyufjh added this pull request to the merge queue Aug 26, 2024
Merged via the queue into risingwavelabs:main with commit 220fded Aug 26, 2024
28 of 30 checks passed
zwang28 pushed a commit that referenced this pull request Aug 27, 2024
Co-authored-by: benjamin-awd <benjamindornel@gmail.com>
(cherry picked from commit 220fded)
kwannoel pushed a commit that referenced this pull request Aug 27, 2024
Co-authored-by: benjamin-awd <benjamindornel@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants